Genetic and epigenetic fine mapping of causal autoimmune disease variants
Supplemental Table 1 provides the genomic coordinates of the disease-associated SNPs.
We visualize clustering of disease-specific SNP sets based on the number of overlapping SNPs.
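The overlap-based similarity can be sketched as follows. The SNP IDs are hypothetical toy data, and the Jaccard index is included only as a size-normalized alternative to the raw shared-SNP count; the text does not say which measure, if either, was used for the figure.

```python
from itertools import combinations

def overlap_matrix(snp_sets):
    """Pairwise overlap between disease-specific SNP sets.

    Returns {(disease_a, disease_b): (shared SNP count, Jaccard index)}.
    """
    result = {}
    for a, b in combinations(sorted(snp_sets), 2):
        shared = snp_sets[a] & snp_sets[b]
        union = snp_sets[a] | snp_sets[b]
        jaccard = len(shared) / len(union) if union else 0.0
        result[(a, b)] = (len(shared), jaccard)
    return result

# Hypothetical SNP IDs for illustration only.
snps = {
    "Celiac_disease": {"rs1", "rs2", "rs3"},
    "Type_1_diabetes": {"rs2", "rs3", "rs4"},
    "HDL_cholesterol": {"rs9"},
}
for pair, (n_shared, jaccard) in overlap_matrix(snps).items():
    print(pair, n_shared, round(jaccard, 2))
```

Raw counts favor diseases with many associated SNPs, whereas the Jaccard index scales the overlap by the size of both sets.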
Out of all regulatory datasets, we select only the transcription factor binding site (TFBS) datasets.
## [1] 1954 39
## [1] 1259 39
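The ENCODE dataset names in the tables below carry the data type in the name (for example `Tfbs` or `Histone`). Assuming the selection is made on those names, a minimal sketch is a substring filter; the helper `select_tracks` is hypothetical and is not taken from the original analysis.

```python
def select_tracks(track_names, keyword):
    """Keep dataset names that contain a data-type keyword such as 'Tfbs' or 'Histone'."""
    return [name for name in track_names if keyword in name]

tracks = [
    "wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2",
    "wgEncodeBroadHistoneGm12878H3k9acStdPk",
    "wgEncodeOpenChromFaireGm12892Pk",
]
print(select_tracks(tracks, "Tfbs"))
```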
We check how regulatory similarity correlates with overlap similarity.
## x y
## x 1.00 0.28
## y 0.28 1.00
##
## n= 1482
##
##
## P
## x y
## x 0
## y 0
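The printed output appears to follow the layout of `Hmisc::rcorr`, with x and y being the two similarity matrices flattened into vectors; a printed P of 0 means the p-value is below the printed precision, not exactly zero. A minimal sketch of the correlation step, assuming all off-diagonal cells are used, reproduces the printed n = 1482 = 39 × 38 for 39 disease sets. Because a symmetric matrix contributes every disease pair twice, r itself is unaffected, but n counts each pair twice, so the p-value is likely optimistic.

```python
import math
import random

def matrix_pearson(a, b, keep_diagonal=False):
    """Pearson r between two k x k matrices, computed over their cells.

    Off-diagonal cells only by default, so n = k * (k - 1); both triangles
    are used, as when a full matrix is flattened into a vector.
    """
    k = len(a)
    cells = [(i, j) for i in range(k) for j in range(k) if keep_diagonal or i != j]
    xs = [a[i][j] for i, j in cells]
    ys = [b[i][j] for i, j in cells]
    n = len(cells)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y), n

# Toy 39 x 39 symmetric matrices with random values (not the real data).
rng = random.Random(1)
k = 39
a = [[1.0] * k for _ in range(k)]
for i in range(k):
    for j in range(i + 1, k):
        a[i][j] = a[j][i] = rng.random()
b = [[x + 0.1 for x in row] for row in a]  # shifted copy, so r should be 1
r, n = matrix_pearson(a, b)
print(round(r, 6), n)
```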
Next, we visualize the regulatory similarity as a heatmap.
Text mining question 1: Do diseases within a cluster share stronger literature similarity than diseases in different clusters? To answer this, we need a literature similarity score for each disease pair; we then split the pairs into within-cluster and between-cluster groups, compare their score distributions with what would be expected by chance, and calculate the corresponding p-values. Expected answer: diseases within each cluster are related to each other in the literature more strongly than expected by chance, whereas diseases in different clusters are not, and this difference may also be statistically significant.
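One way to compare the within- and between-cluster score distributions with chance is a permutation test that shuffles cluster labels among diseases. The sketch below assumes a difference of means as the test statistic and uses hypothetical scores; it illustrates the idea rather than the method actually used.

```python
import random
from itertools import combinations

def within_between_test(scores, clusters, n_perm=10000, seed=0):
    """One-sided permutation test on literature similarity scores.

    scores:   dict mapping frozenset({disease_a, disease_b}) -> score
    clusters: dict mapping disease -> cluster label
    Returns (observed within-minus-between mean difference, p-value).
    """
    diseases = sorted(clusters)
    pairs = [frozenset(p) for p in combinations(diseases, 2)]

    def mean_difference(labels):
        within, between = [], []
        for pair in pairs:
            a, b = sorted(pair)
            group = within if labels[a] == labels[b] else between
            group.append(scores[pair])
        return sum(within) / len(within) - sum(between) / len(between)

    observed = mean_difference(clusters)
    rng = random.Random(seed)
    shuffled = [clusters[d] for d in diseases]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute labels, keeping cluster sizes fixed
        if mean_difference(dict(zip(diseases, shuffled))) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one permutation.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical scores for four diseases in two clusters.
clusters = {"A": 1, "B": 1, "C": 2, "D": 2}
scores = {frozenset(p): 0.1 for p in combinations("ABCD", 2)}
scores[frozenset("AB")] = 0.9
scores[frozenset("CD")] = 0.8
print(within_between_test(scores, clusters, n_perm=2000))
```

With only four diseases there are three distinct two-by-two partitions, so the p-value is close to 1/3 here; with 39 diseases the permutation distribution is much finer.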
The 10 disease pairs whose associated SNP sets are most similar to each other:
##
## -----------------------------------------------------------------------------------------------
## Disease 1 Disease 2 Corr. coefficient
## ---------------------------------------------- ---------------------------- -------------------
## HDL_cholesterol Triglycerides 0.5484
##
## Kawasaki_disease Systemic_lupus_erythematosus 0.5352
##
## Bone_mineral_density Type_2_diabetes 0.5268
##
## Kawasaki_disease Multiple_sclerosis 0.501
##
## Kawasaki_disease Rheumatoid_arthritis 0.4775
##
## Celiac_disease Kawasaki_disease 0.4754
##
## LDL_cholesterol Triglycerides 0.4743
##
## Kawasaki_disease Ulcerative_colitis 0.4661
##
## Liver_enzyme_levels_gamma_glutamyl_transferase Urate_levels 0.4191
##
## Alzheimers_combined Bone_mineral_density 0.4149
## -----------------------------------------------------------------------------------------------
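The table reads like the upper triangle of the regulatory similarity matrix sorted in decreasing order: each pair appears once, the diagonal is excluded, and in every row Disease 1 precedes Disease 2 alphabetically. A sketch under that assumption, with a toy matrix:

```python
import heapq
from itertools import combinations

def top_pairs(names, matrix, k=10):
    """Return the k largest (score, name_i, name_j) entries above the diagonal."""
    candidates = (
        (matrix[i][j], names[i], names[j])
        for i, j in combinations(range(len(names)), 2)
    )
    return heapq.nlargest(k, candidates)

names = ["A", "B", "C"]
sim = [[1.0, 0.5, 0.2],
       [0.5, 1.0, 0.9],
       [0.2, 0.9, 1.0]]
print(top_pairs(names, sim, k=2))  # [(0.9, 'B', 'C'), (0.5, 'A', 'B')]
```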
The similarity dendrogram can be cut into four separate groups:
## Cluster01 has 8 members
## Kawasaki_disease
## Systemic_lupus_erythematosus
## Celiac_disease
## Ulcerative_colitis
## Psoriasis
## Multiple_sclerosis
## Rheumatoid_arthritis
## Allergy
##
## Cluster02 has 9 members
## Systemic_sclerosis
## Primary_biliary_cirrhosis
## Atopic_dermatitis
## Juvenile_idiopathic_arthritis
## Ankylosing_spondylitis
## Crohns_disease
## Type_1_diabetes
## Autoimmune_thyroiditis
## Primary_sclerosing_cholangitis
##
## Cluster03 has 10 members
## Urate_levels
## Liver_enzyme_levels_gamma_glutamyl_transferase
## LDL_cholesterol
## HDL_cholesterol
## Triglycerides
## Renal_function_related_traits_BUN
## Platelet_counts
## Red_blood_cell_traits
## C_reactive_protein
## Fasting_glucose_related_traits
##
## Cluster04 has 12 members
## Chronic_kidney_disease
## Alzheimers_combined
## Bone_mineral_density
## Type_2_diabetes
## Vitiligo
## Migraine
## Alopecia_areata
## Asthma
## Creatinine_levels
## Behcets_disease
## Progressive_supranuclear_palsy
## Restless_legs_syndrome
##
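How the dendrogram was built and cut is not stated. Assuming average linkage on the distance 1 - r and a fixed number of groups (four here), the step can be sketched in plain Python; R's `hclust` with `cutree`, or `scipy.cluster.hierarchy`, would be the usual library route.

```python
from itertools import combinations

def average_linkage(names, sim, k):
    """Agglomerative clustering with average linkage on distance 1 - similarity.

    Repeatedly merges the two closest clusters until k remain, which is
    equivalent to cutting the dendrogram into k groups.
    """
    index = {name: i for i, name in enumerate(names)}
    clusters = [[name] for name in names]

    def distance(c1, c2):
        total = sum(1 - sim[index[a]][index[b]] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return clusters

names = ["A", "B", "C", "D"]
sim = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.8],
       [0.1, 0.1, 0.8, 1.0]]
print(average_linkage(names, sim, k=2))  # [['A', 'B'], ['C', 'D']]
```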
The “Enrichment 1/2” columns (labeled with the group names, e.g., c1 and c2, in the tables below) show the average p-values of the group-specific SNP-regulatory associations; a “-” sign indicates that an association is underrepresented. The “p-value” column (adj.P.Val) shows whether the associations differ significantly between the groups.
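The column name adj.P.Val matches the output of limma's `topTable`, whose default `adjust.method` is `"BH"`. Assuming that default was kept, the adjustment works as in this Benjamini-Hochberg sketch (the same procedure as R's `p.adjust(p, method = "BH")`):

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that a smaller raw
    # p-value never receives a larger adjusted value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```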
## [1] "c1 vs. c2 , number of degs significant at adj.p.val<0.5: 54"
##
## ---------------------------------------------------------------------------------------------------------
## Row.names c1 c2 adj.P.Val V2
## -------------------------------------------------- --------- ------ ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.0006292 0.8227 0.0001666 GM12878 RUNX3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk 0.008529 0.9186 0.0002231 GM18951 NFKB IgG-rab TNFa
## ChIP-seq Peaks from
## ENCODE/SYDH
##
## wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1 0.001167 0.6894 0.0002231 GM12878 FOXM1 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeSydhTfbsGm19099NfkbTnfaIggrabPk 0.008903 0.9673 0.0002231 GM19099 NFKB IgG-rab TNFa
## ChIP-seq Peaks from
## ENCODE/SYDH
##
## wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2 0.0001479 0.6107 0.0002929 GM12878 PML v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1 0.0008332 0.8376 0.0002929 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep1 0.003957 0.942 0.0003906 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1 6.767e-05 0.4192 0.0004104 GM12878 NFIC v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.000599 0.6266 0.0004104 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2 0.003147 0.7082 0.0004104 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
## ---------------------------------------------------------------------------------------------------------
##
## [1] "c1 vs. c3 , number of degs significant at adj.p.val<0.5: 56"
##
## ----------------------------------------------------------------------------------------------------------
## Row.names c1 c3 adj.P.Val V2
## -------------------------------------------------- --------- ------- ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.0006292 -0.3199 1.525e-06 GM12878 RUNX3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1 0.001167 -0.2378 1.978e-06 GM12878 FOXM1 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1 6.767e-05 -0.1347 3.878e-06 GM12878 NFIC v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk 0.008529 -0.6289 5.781e-06 GM18951 NFKB IgG-rab TNFa
## ChIP-seq Peaks from
## ENCODE/SYDH
##
## wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2 0.0001479 -0.3198 8.173e-06 GM12878 PML v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2 0.003147 -0.3812 8.605e-06 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1 0.0008332 -0.3287 8.605e-06 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeSydhTfbsGm19099NfkbTnfaIggrabPk 0.008903 -0.5609 9.529e-06 GM19099 NFKB IgG-rab TNFa
## ChIP-seq Peaks from
## ENCODE/SYDH
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep1 0.003957 -0.3773 9.887e-06 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.000599 -0.4904 1.692e-05 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
## ----------------------------------------------------------------------------------------------------------
##
## [1] "c1 vs. c4 , number of degs significant at adj.p.val<0.5: 55"
##
## -----------------------------------------------------------------------------------------------------------
## Row.names c1 c4 adj.P.Val V2
## -------------------------------------------------- --------- -------- ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.0006292 -0.4205 1.012e-06 GM12878 RUNX3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2 0.0001479 -0.243 3.42e-06 GM12878 PML v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeSydhTfbsGm18951NfkbTnfaIggrabPk 0.008529 -0.6439 3.42e-06 GM18951 NFKB IgG-rab TNFa
## ChIP-seq Peaks from
## ENCODE/SYDH
##
## wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1 0.001167 -0.5047 3.42e-06 GM12878 FOXM1 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1 6.767e-05 -0.2602 3.42e-06 GM12878 NFIC v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeSydhTfbsGm19099NfkbTnfaIggrabPk 0.008903 -0.5014 3.42e-06 GM19099 NFKB IgG-rab TNFa
## ChIP-seq Peaks from
## ENCODE/SYDH
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.000599 -0.3133 3.42e-06 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2 0.003147 -0.4064 3.42e-06 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1 0.0008332 -0.4112 4.748e-06 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Mta3sc81325V0422111PkRep2 2.011e-05 -0.09226 5.47e-06 GM12878 MTA3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
## -----------------------------------------------------------------------------------------------------------
## [1] "Counts of regulatory elements differentially associated with each group"
##
## ----------------------------
## c1 c2 c3 c4
## -------- ---- ---- ---- ----
## **c1** 0 54 56 55
##
## **c2** 0 0 0 0
##
## **c3** 0 0 0 0
##
## **c4** 0 0 0 0
## ----------------------------
Text mining question 2: Based on literature evidence, are the terms more strongly associated with the diseases in one cluster than with those in the other? Are the terms themselves related in the literature? Expected answer: yes, the literature associations should confirm these relationships.
| | C2 | C3 | C4 |
|---|---|---|---|
| C1 | Cell types: Gm12878; Reg: NFkB, Pol2, MTA3, NFIC, NFATC1 | Cell types: Gm12878; Reg: NFkB, Pol2, MTA3, NFIC, NFATC1 | Cell types: Gm12878; Reg: NFkB, Pol2, MTA3, NFIC, NFATC1 |
| C2 | | Nothing significant | Nothing significant |
| C3 | | | Nothing significant |
Out of all regulatory datasets, we select only the histone mark datasets.
## [1] 721 39
## [1] 610 39
We check how regulatory similarity correlates with overlap similarity.
## x y
## x 1.00 0.23
## y 0.23 1.00
##
## n= 1482
##
##
## P
## x y
## x 0
## y 0
Next, we visualize the regulatory similarity as a heatmap.
Text mining question 1: Do diseases within a cluster share stronger literature similarity than diseases in different clusters? To answer this, we need a literature similarity score for each disease pair; we then split the pairs into within-cluster and between-cluster groups, compare their score distributions with what would be expected by chance, and calculate the corresponding p-values. Expected answer: diseases within each cluster are related to each other in the literature more strongly than expected by chance, whereas diseases in different clusters are not, and this difference may also be statistically significant.
The 10 disease pairs whose associated SNP sets are most similar to each other:
##
## ---------------------------------------------------------------------------------------
## Disease 1 Disease 2 Corr. coefficient
## --------------------------------- --------------------------------- -------------------
## HDL_cholesterol Triglycerides 0.621
##
## Rheumatoid_arthritis Ulcerative_colitis 0.4856
##
## HDL_cholesterol LDL_cholesterol 0.48
##
## HDL_cholesterol Platelet_counts 0.4609
##
## Platelet_counts Triglycerides 0.4504
##
## LDL_cholesterol Triglycerides 0.4151
##
## Creatinine_levels Renal_function_related_traits_BUN 0.3915
##
## Psoriasis Systemic_lupus_erythematosus 0.3911
##
## Renal_function_related_traits_BUN Urate_levels 0.3689
##
## Alopecia_areata C_reactive_protein 0.3686
## ---------------------------------------------------------------------------------------
The similarity dendrogram can be cut into four separate groups:
## Cluster01 has 6 members
## Celiac_disease
## Multiple_sclerosis
## Kawasaki_disease
## Primary_biliary_cirrhosis
## Systemic_lupus_erythematosus
## Psoriasis
##
## Cluster02 has 14 members
## Type_2_diabetes
## Fasting_glucose_related_traits
## Red_blood_cell_traits
## Crohns_disease
## Migraine
## Systemic_sclerosis
## Ankylosing_spondylitis
## Platelet_counts
## Triglycerides
## HDL_cholesterol
## Vitiligo
## Progressive_supranuclear_palsy
## Liver_enzyme_levels_gamma_glutamyl_transferase
## LDL_cholesterol
##
## Cluster03 has 11 members
## Allergy
## Type_1_diabetes
## Primary_sclerosing_cholangitis
## Juvenile_idiopathic_arthritis
## Behcets_disease
## Ulcerative_colitis
## Rheumatoid_arthritis
## Autoimmune_thyroiditis
## Alopecia_areata
## C_reactive_protein
## Asthma
##
## Cluster04 has 8 members
## Bone_mineral_density
## Chronic_kidney_disease
## Alzheimers_combined
## Restless_legs_syndrome
## Atopic_dermatitis
## Urate_levels
## Renal_function_related_traits_BUN
## Creatinine_levels
##
The “Enrichment 1/2” columns (labeled with the group names, e.g., c1 and c2, in the tables below) show the average p-values of the group-specific SNP-regulatory associations; a “-” sign indicates that an association is underrepresented. The “p-value” column (adj.P.Val) shows whether the associations differ significantly between the groups.
## [1] "c1 vs. c2 , number of degs significant at adj.p.val<0.5: 44"
##
## ------------------------------------------------------------------------------------------------------------
## Row.names c1 c2 adj.P.Val V2
## ----------------------------------------------- --------- -------- ----------- -----------------------------
## wgEncodeUwHistoneGm12875H3k04me3StdHotspotsRep1 0.0005212 -0.5773 5.236e-08 GM12875 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
##
## wgEncodeBroadHistoneGm12878H3k9acStdPk 3.849e-12 -0.205 8.601e-07 GM12878 H3K9ac Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k4me2StdPk 8.782e-09 -0.06392 8.601e-07 GM12878 H3K4me2 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep2 5.452e-07 -0.3131 1.346e-06 GM12865 H3K4me3 Histone Mod
## ChIP-seq Hotspots 2 from
## ENCODE/UW
##
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep1 2.881e-06 -0.3862 7.309e-06 GM12865 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
##
## wgEncodeUwHistoneGm12864H3k04me3StdHotspotsRep2 0.003202 -0.7181 9.149e-06 GM12864 H3K4me3 Histone Mod
## ChIP-seq Hotspots 2 from
## ENCODE/UW
##
## wgEncodeBroadHistoneDnd41H3k09acPk 0.0001527 -0.6869 1.924e-05 Dnd41 H3K9ac Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k04me3StdPkV2 4.708e-08 -0.167 2.222e-05 GM12878 H3K4me3 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k79me2StdPk 1.867e-08 -0.9988 3.557e-05 GM12878 H3K79me2 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k04me1StdPkV2 6.263e-15 -0.01015 3.557e-05 GM12878 H3K4me1 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
## ------------------------------------------------------------------------------------------------------------
##
## [1] "c1 vs. c3 , number of degs significant at adj.p.val<0.5: 54"
##
## -----------------------------------------------------------------------------------------------------------
## Row.names c1 c3 adj.P.Val V2
## ----------------------------------------------- --------- ------- ----------- -----------------------------
## wgEncodeUwHistoneGm12875H3k04me3StdHotspotsRep1 0.0005212 0.8142 1.464e-06 GM12875 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
##
## wgEncodeBroadHistoneGm12878H3k9acStdPk 3.849e-12 0.2791 2.209e-05 GM12878 H3K9ac Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep2 5.452e-07 0.6608 3.836e-05 GM12865 H3K4me3 Histone Mod
## ChIP-seq Hotspots 2 from
## ENCODE/UW
##
## wgEncodeBroadHistoneGm12878H3k4me2StdPk 8.782e-09 0.5631 5.197e-05 GM12878 H3K4me2 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k79me2StdPk 1.867e-08 -0.4995 7.28e-05 GM12878 H3K79me2 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep1 2.881e-06 0.7926 7.28e-05 GM12865 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
##
## wgEncodeUwHistoneGm12864H3k04me3StdHotspotsRep2 0.003202 0.9054 7.28e-05 GM12864 H3K4me3 Histone Mod
## ChIP-seq Hotspots 2 from
## ENCODE/UW
##
## wgEncodeBroadHistoneDnd41H3k09acPk 0.0001527 -0.9118 7.28e-05 Dnd41 H3K9ac Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneDnd41H3k04me1Pk 9.105e-08 -0.7878 0.0001171 Dnd41 H3K4me1 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeUwHistoneGm06990H3k4me3StdHotspotsRep1 0.0001636 -0.9922 0.0005313 GM06990 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
## -----------------------------------------------------------------------------------------------------------
##
## [1] "c1 vs. c4 , number of degs significant at adj.p.val<0.5: 55"
##
## -------------------------------------------------------------------------------------------------------------
## Row.names c1 c4 adj.P.Val V2
## ----------------------------------------------- --------- --------- ----------- -----------------------------
## wgEncodeUwHistoneGm12875H3k04me3StdHotspotsRep1 0.0005212 -0.3919 2.056e-07 GM12875 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
##
## wgEncodeBroadHistoneGm12878H3k4me2StdPk 8.782e-09 -0.02811 4.143e-06 GM12878 H3K4me2 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k9acStdPk 3.849e-12 -0.1383 4.143e-06 GM12878 H3K9ac Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k79me2StdPk 1.867e-08 -0.005656 5.047e-06 GM12878 H3K79me2 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep2 5.452e-07 -0.2947 8.699e-06 GM12865 H3K4me3 Histone Mod
## ChIP-seq Hotspots 2 from
## ENCODE/UW
##
## wgEncodeBroadHistoneGm12878H3k04me3StdPkV2 4.708e-08 -0.01621 2.43e-05 GM12878 H3K4me3 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeUwHistoneGm12864H3k04me3StdHotspotsRep2 0.003202 -0.5261 2.535e-05 GM12864 H3K4me3 Histone Mod
## ChIP-seq Hotspots 2 from
## ENCODE/UW
##
## wgEncodeUwHistoneGm12865H3k04me3StdHotspotsRep1 2.881e-06 -0.3212 2.572e-05 GM12865 H3K4me3 Histone Mod
## ChIP-seq Hotspots 1 from
## ENCODE/UW
##
## wgEncodeBroadHistoneDnd41H3k09acPk 0.0001527 -0.3194 2.572e-05 Dnd41 H3K9ac Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneDnd41H3k04me1Pk 9.105e-08 -0.06965 3.248e-05 Dnd41 H3K4me1 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
## -------------------------------------------------------------------------------------------------------------
##
## [1] "c2 vs. c3 , number of degs significant at adj.p.val<0.5: 18"
##
## -------------------------------------------------------------------------------------------------------
## Row.names c2 c3 adj.P.Val V2
## ------------------------------------------ -------- --------- ----------- -----------------------------
## wgEncodeBroadHistoneA549H3k79me2Dex100nmPk 0.008678 -0.02115 0.01579 A549 DEX 100 nM H3K79me2
## Histone Mods by ChIP-seq
## Peaks from ENCODE/Broad
##
## wgEncodeBroadHistoneHsmmH3k27me3StdPk -0.02327 3.654e-07 0.02165 HSMM H3K27me3 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneNhaH3k27me3StdPk -0.01506 0.0001926 0.02689 NH-A H3K27me3 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneA549H3k36me3Dex100nmPk 0.1047 -0.004845 0.02689 A549 DEX 100 nM H3K36me3
## Histone Mods by ChIP-seq
## Peaks from ENCODE/Broad
##
## wgEncodeBroadHistoneNhlfH3k79me2Pk 0.005332 -0.05179 0.03198 NHLF H3K79me2 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneK562H3k36me3StdPk 0.001587 -0.009793 0.03505 K562 H3K36me3 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneHsmmtH3k09me3Pk -0.02596 0.003835 0.04333 HSMMtube H3K9me3 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneA549H3k27me3Etoh02Pk -0.01807 0.002645 0.04472 A549 EtOH 0.02% H3K27me3
## Histone Mods by ChIP-seq
## Peaks from ENCODE/Broad
##
## wgEncodeBroadHistoneHsmmtH3k27me3Pk -0.01922 0.001151 0.04472 HSMMtube H3K27me3 Histone
## Mods by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneNhdfadH4k20me1Pk 0.001681 -0.02737 0.04472 NHDF-Ad H4K20me1 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
## -------------------------------------------------------------------------------------------------------
##
## [1] "c2 vs. c4 , number of degs significant at adj.p.val<0.5: 5"
##
## -----------------------------------------------------------------------------------------------------
## Row.names c2 c4 adj.P.Val V2
## ---------------------------------------- ------- ---------- ----------- -----------------------------
## wgEncodeBroadHistoneNhdfadH3k36me3StdPk 0.1009 -0.005215 0.02027 NHDF-Ad H3K36me3 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneNhekH3k9me1StdPk 0.2002 -0.001186 0.02253 NHEK H3K9me1 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneHmecH3k36me3StdPk 0.01773 -0.0002847 0.04273 HMEC H3K36me3 Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneGm12878H3k36me3StdPk 0.2219 -0.001063 0.05838 GM12878 H3K36me3 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
##
## wgEncodeBroadHistoneK562NcorPk 0.04478 -0.005174 0.0846 K562 NCoR Histone Mods by
## ChIP-seq Peaks from
## ENCODE/Broad
## -----------------------------------------------------------------------------------------------------
## [1] "Counts of regulatory elements differentially associated with each group"
##
## ----------------------------
## c1 c2 c3 c4
## -------- ---- ---- ---- ----
## **c1** 0 44 54 55
##
## **c2** 0 0 18 5
##
## **c3** 0 0 0 0
##
## **c4** 0 0 0 0
## ----------------------------
Text mining question 2: Based on literature evidence, are the terms more strongly associated with the diseases in one cluster than with those in the other? Are the terms themselves related in the literature? Expected answer: yes, the literature associations should confirm these relationships.
| | C2 | C3 | C4 |
|---|---|---|---|
| C1 | Cell types: Gm12878, CD20+; Reg: H3K4me1, H3K9me3, H3K9ac, H3K27ac, H2az, H3K4me2 | Cell types: Gm12878, CD20+; Reg: H3K4me1, H3K9me3, H3K9ac, H3K27ac, H2az, H3K4me2 | Cell types: Gm12878, CD20+; Reg: H3K4me1, H3K9me3, H3K9ac, H3K27ac, H2az, H3K4me2 |
| C2 | | Cell types: K562, NHEK, NHDF-Ad, NH-A, HMEC; Reg: H3K36me3, H4K20me1, H3K79me2 | Nothing significant |
| C3 | | | Nothing significant |
Next, we use all regulatory datasets together, aiming for potentially tighter clustering.
## [1] 4498 39
## [1] 2969 39
We check how regulatory similarity correlates with overlap similarity.
## x y
## x 1.00 0.33
## y 0.33 1.00
##
## n= 1482
##
##
## P
## x y
## x 0
## y 0
Next, we visualize the regulatory similarity as a heatmap.
The 10 disease pairs whose associated SNP sets are most similar to each other:
##
## --------------------------------------------------------------------------------------------
## Disease 1 Disease 2 Corr. coefficient
## ---------------------------------------------- ------------------------- -------------------
## HDL_cholesterol Triglycerides 0.473
##
## LDL_cholesterol Triglycerides 0.4314
##
## Chronic_kidney_disease Urate_levels 0.3742
##
## HDL_cholesterol LDL_cholesterol 0.3475
##
## Bone_mineral_density Type_2_diabetes 0.3225
##
## Multiple_sclerosis Primary_biliary_cirrhosis 0.316
##
## Alzheimers_combined Type_2_diabetes 0.2999
##
## Liver_enzyme_levels_gamma_glutamyl_transferase Urate_levels 0.2976
##
## Fasting_glucose_related_traits Type_2_diabetes 0.2972
##
## Liver_enzyme_levels_gamma_glutamyl_transferase Platelet_counts 0.2944
## --------------------------------------------------------------------------------------------
The similarity dendrogram can be cut into four separate groups:
## Cluster01 has 14 members
## Platelet_counts
## Liver_enzyme_levels_gamma_glutamyl_transferase
## Red_blood_cell_traits
## LDL_cholesterol
## HDL_cholesterol
## Triglycerides
## Type_2_diabetes
## Fasting_glucose_related_traits
## Bone_mineral_density
## Alzheimers_combined
## Creatinine_levels
## Renal_function_related_traits_BUN
## Urate_levels
## Chronic_kidney_disease
##
## Cluster02 has 9 members
## Multiple_sclerosis
## Kawasaki_disease
## Celiac_disease
## Systemic_lupus_erythematosus
## Psoriasis
## Ulcerative_colitis
## Rheumatoid_arthritis
## Crohns_disease
## Autoimmune_thyroiditis
##
## Cluster03 has 5 members
## Primary_biliary_cirrhosis
## Ankylosing_spondylitis
## Systemic_sclerosis
## Migraine
## Primary_sclerosing_cholangitis
##
## Cluster04 has 11 members
## Juvenile_idiopathic_arthritis
## Atopic_dermatitis
## Alopecia_areata
## C_reactive_protein
## Allergy
## Type_1_diabetes
## Vitiligo
## Behcets_disease
## Progressive_supranuclear_palsy
## Restless_legs_syndrome
## Asthma
##
The “Enrichment 1/2” columns (labeled with the group names, e.g., c1 and c2, in the tables below) show the average p-values of the group-specific SNP-regulatory associations; a “-” sign indicates that an association is underrepresented. The “p-value” column (adj.P.Val) shows whether the associations differ significantly between the groups.
## [1] "c1 vs. c2 , number of degs significant at adj.p.val<0.5: 116"
##
## ----------------------------------------------------------------------------------------------------------------
## Row.names c1 c2 adj.P.Val V2
## -------------------------------------------------- ---------- --------- ----------- ----------------------------
## wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1 -0.0696 0.0001995 1.664e-05 GM12878 NFIC v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1 -0.1765 0.002657 1.664e-05 GM12878 FOXM1 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 -0.2057 0.00165 1.664e-05 GM12878 RUNX3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeOpenChromFaireGm12892Pk -0.2289 0.001429 2.372e-05 GM12892 FAIRE Peaks from
## ENCODE/OpenChrom(UNC)
##
## wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2 -0.1244 0.0004603 2.849e-05 GM12878 PML v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Mta3sc81325V0422111PkRep2 -0.04666 6.729e-05 3.617e-05 GM12878 MTA3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 -0.2152 0.001387 3.617e-05 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1 -0.1962 0.002065 3.617e-05 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep1 -0.1173 0.001776 4.267e-05 GM12878 RUNX3 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeBroadHistoneGm12878H3k9me3StdPk -2.144e-06 7.237e-08 4.636e-05 GM12878 H3K9me3 Histone Mods
## by ChIP-seq Peaks from
## ENCODE/Broad
## ----------------------------------------------------------------------------------------------------------------
##
## [1] "c1 vs. c3 , number of degs significant at adj.p.val<0.5: 1"
##
## ---------------------------------------------------------------------------------------------------
## Row.names c1 c3 adj.P.Val V2
## ---------------------------------------- ------- --------- ----------- ----------------------------
## wgEncodeBroadHistoneMonocd14ro1746CtcfPk -0.6355 4.879e-06 0.01794 Monocytes CD14+ CTCF Histone
## Mods by ChIP-seq Peaks from
## ENCODE/Broad
## ---------------------------------------------------------------------------------------------------
##
## [1] "c2 vs. c4 , number of degs significant at adj.p.val<0.5: 76"
##
## ----------------------------------------------------------------------------------------------------------
## Row.names c2 c4 adj.P.Val V2
## -------------------------------------------------- --------- ------- ----------- -------------------------
## wgEncodeHaibTfbsGm12878Runx3sc101553V0422111PkRep2 0.00165 0.8762 0.005057 GM12878 RUNX3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Nficsc81335V0422111PkRep1 0.0001995 0.8082 0.005057 GM12878 NFIC v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeOpenChromFaireGm12892Pk 0.001429 0.8088 0.005057 GM12892 FAIRE Peaks from
## ENCODE/OpenChrom(UNC)
##
## wgEncodeHaibTfbsGm12878Foxm1sc502V0422111PkRep1 0.002657 0.8098 0.005057 GM12878 FOXM1 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Pmlsc71910V0422111PkRep2 0.0004603 0.8089 0.005801 GM12878 PML v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep1 0.007786 -0.9993 0.00689 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Mta3sc81325V0422111PkRep2 6.729e-05 -0.9847 0.00689 GM12878 MTA3 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Stat5asc74442V0422111PkRep2 0.001387 0.6498 0.00689 GM12878 STAT5A v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep1 0.002065 0.7675 0.00689 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 1 from
## ENCODE/HAIB
##
## wgEncodeHaibTfbsGm12878Atf2sc81188V0422111PkRep2 0.007315 0.8856 0.00899 GM12878 ATF2 v042211.1
## ChIP-seq Peaks Rep 2 from
## ENCODE/HAIB
## ----------------------------------------------------------------------------------------------------------
##
## [1] "c3 vs. c4 , number of degs significant at adj.p.val<0.5: 1"
##
## --------------------------------------------------------------------------------------------------
## Row.names c3 c4 adj.P.Val V2
## ---------------------------------------- --------- ------ ----------- ----------------------------
## wgEncodeBroadHistoneMonocd14ro1746CtcfPk 4.879e-06 0.7054 0.08407 Monocytes CD14+ CTCF Histone
## Mods by ChIP-seq Peaks from
## ENCODE/Broad
## --------------------------------------------------------------------------------------------------
## [1] "Counts of regulatory elements differentially associated with each group"
##
## ----------------------------
## c1 c2 c3 c4
## -------- ---- ---- ---- ----
## **c1** 0 116 1 0
##
## **c2** 0 0 0 76
##
## **c3** 0 0 0 1
##
## **c4** 0 0 0 0
## ----------------------------
The results are less clear-cut than those obtained with subsets of the regulatory datasets.
| | c2 | c3 | c4 |
|---|---|---|---|
| c1 (14 members) | 416 total up in C2; Cell types: Gm, B cells; Factors: NFIC, FOXM1, RUNX3, CEBPB and other TFBSs; DNAse HS | 1 total up in C3; Cell types: Monocytes CD14+; Factors: CTCF | |
| c2 (9 members) | | | 96 total up in C2; Cell types: B cells, Gm; Factors: RUNX3, NFIC, FOXM1 and other TFBSs, DNAse HS |
| c3 (5 members) | | | 1 total up in C3; Cell types: Monocytes CD14+; Factors: CTCF |
To evaluate whether regulatory and co-morbidity measurements correlate, we download a matrix of disease-disease co-morbidity correlations (AllNet3.txt).
We create square matrices (39x39) of disease-disease co-morbidity correlations and of regulatory correlations.
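Building the co-morbidity matrix requires turning pairwise records into a symmetric disease-by-disease matrix. The sketch assumes that AllNet3.txt can be reduced to rows of (disease, disease, value) once its codes are mapped to the disease names used here; that mapping, the helper name `pairs_to_matrix`, and the fill values for the diagonal and for missing pairs are all assumptions.

```python
def pairs_to_matrix(names, edges, diagonal=1.0, missing=0.0):
    """Build a symmetric k x k matrix from (disease_a, disease_b, value) rows.

    Rows that mention a disease outside `names` are skipped.
    """
    index = {name: i for i, name in enumerate(names)}
    k = len(names)
    mat = [[missing] * k for _ in range(k)]
    for i in range(k):
        mat[i][i] = diagonal
    for a, b, value in edges:
        if a in index and b in index:
            i, j = index[a], index[b]
            mat[i][j] = mat[j][i] = value
    return mat

edges = [("A", "B", 0.3), ("C", "A", 0.1), ("A", "Z", 0.9)]
print(pairs_to_matrix(["A", "B", "C"], edges))
```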
To evaluate the correlation between the two measurement methods, we correlate the matrices with each other. The output consists of a matrix of correlation coefficients, the number of pairs used (n), and a matrix of p-values (P).
It is still debated whether self-self associations (the diagonal of each matrix) should be removed or kept.
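The two choices differ in how many cells enter the correlation. For k = 39 disease sets, the counts agree with the n values printed in this document (1482 for the regulatory-versus-overlap checks, 1521 for the co-morbidity checks below):

$$
n_{\text{without diagonal}} = k(k-1) = 39 \times 38 = 1482,
\qquad
n_{\text{with diagonal}} = k^{2} = 39^{2} = 1521
$$

Keeping the diagonal adds k cells in which both matrices hold their self-similarity values; when these are large constants (such as 1 in a correlation matrix), they tend to pull the correlation coefficient upward.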
## [1] "Co-occurrence"
## x y
## x 1.0 0.3
## y 0.3 1.0
##
## n= 1521
##
##
## P
## x y
## x 0
## y 0
## [1] "Relative risk"
## x y
## x 1.00 0.38
## y 0.38 1.00
##
## n= 1521
##
##
## P
## x y
## x 0
## y 0
## [1] "Phi-correlation"
## x y
## x 1.00 0.41
## y 0.41 1.00
##
## n= 1521
##
##
## P
## x y
## x 0
## y 0
The regulatory and co-morbidity-based (phi-correlation) disease-disease correlations correlate with each other at a Pearson correlation coefficient of 0.41 (n = 1521, i.e., keeping self-correlations; the p-value is printed as 0, meaning it is below the printed precision). Using relative-risk co-morbidity correlations produces a similar result (r = 0.38).
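The columns of AllNet3.txt are not described here. Assuming the file uses the usual co-morbidity definitions computed from patient counts (N patients in total, P_i and P_j patients with each disease, C_ij patients with both), the two measures compared above can be sketched as:

```python
import math

def relative_risk(c_ij, p_i, p_j, n):
    """Observed co-occurrence divided by the count expected under independence."""
    return c_ij * n / (p_i * p_j)

def phi_correlation(c_ij, p_i, p_j, n):
    """Pearson correlation between the two binary disease indicators."""
    return (c_ij * n - p_i * p_j) / math.sqrt(p_i * p_j * (n - p_i) * (n - p_j))

# Toy counts: 1000 patients, 100 with each disease, 50 with both.
print(relative_risk(50, 100, 100, 1000), phi_correlation(50, 100, 100, 1000))
```

Phi is bounded by 1 in absolute value, whereas relative risk is unbounded and tends to be large for pairs of rare diseases, which is one reason the two measures can rank disease pairs differently.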
## [1] "sharedRels correlation with episimilarity"
##
## ---------------------
## x y
## ------- ------ ------
## **x** 1 0.3701
##
## **y** 0.3701 1
## ---------------------
##
## [1] "obsExp correlation with episimilarity"
##
## ---------------------
## x y
## ------- ------ ------
## **x** 1 0.5449
##
## **y** 0.5449 1
## ---------------------
##
## [1] "directStr correlation with episimilarity"
##
## ---------------------
## x y
## ------- ------ ------
## **x** 1 0.2504
##
## **y** 0.2504 1
## ---------------------
##
## [1] "relOverlap correlation with episimilarity"
##
## ---------------------
## x y
## ------- ------ ------
## **x** 1 0.3825
##
## **y** 0.3825 1
## ---------------------
##
## [1] "misn correlation with episimilarity"
##
## -------------------
## x y
## ------- ----- -----
## **x** 1 0.725
##
## **y** 0.725 1
## -------------------